AITopics | data structure

Privacy Reasoning in Ambiguous Contexts

Neural Information Processing SystemsJun-23-2026, 03:11:14 GMT

We study the ability of language models to reason about appropriate information disclosure--a central aspect of the evolving field of agentic privacy. Whereas previous works have focused on evaluating a model's ability to align with human decisions, we examine the role of ambiguity and missing context on model performance when making information-sharing decisions. We identify context ambiguity as a crucial barrier for high performance in privacy assessments. By designing Camber, a framework for context disambiguation, we show that model-generated decision rationales can reveal ambiguities and that systematically disambiguating context based on these rationales leads to significant accuracy improvements (up to 13.3% in precision and up to 22.3% in recall) as well as reductions in prompt sensitivity. Overall, our results indicate that approaches for context disambiguation are a promising way forward to enhance agentic privacy reasoning.

large language model, machine learning, natural language, (23 more...)

Neural Information Processing Systems

Country: North America > Mexico (0.27)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Law (0.93)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.67)
Education (0.67)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

The Structural Complexity of Matrix-Vector Multiplication

Neural Information Processing SystemsJun-16-2026, 05:22:06 GMT

We consider the problem of preprocessing an n n matrix M, and supporting queries that, for any vector v, returns the matrix-vector product Mv. This problem has been extensively studied in both theory and practice: on one side, practitioners have developed algorithms that are highly efficient in practice, whereas on the other side, theoreticians have proven that the problem cannot be solved faster than naive multiplication in the worst-case. This lower bound holds even in the average-case, implying that existing average-case analyses cannot explain this gap between theory and practice. Hence, we study the problem for structured matrices. We show that for n n Boolean matrices of VC-dimension d, the matrix-vector multiplication problem can be solved with eO(n2)preprocessing and eO(n2 1/d) query time.

artificial intelligence, machine learning, vc-dimension, (19 more...)

Neural Information Processing Systems

Country: North America > United States > California (0.67)

Genre: Research Report > Experimental Study (1.00)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Discovering Data Structures: Nearest Neighbor Search and Beyond

Neural Information Processing SystemsJun-10-2026, 11:05:33 GMT

We explore if it is possible to learn data structures end-to-end with neural networks, with a focus on the problem of nearest-neighbor (NN) search. We introduce a framework for data structure discovery, which adapts to the underlying data distribution and provides fine-grained control over query and space complexity. Crucially, the data structure is learned from scratch, and does not require careful initialization or seeding with candidate data structures. In several settings, we are able to reverse-engineer the learned data structures and query algorithms. For 1D nearest neighbor search, the model discovers optimal distribution (in)dependent algorithms such as binary search and variants of interpolation search. In higher dimensions, the model learns solutions that resemble k-d trees in some regimes, while in others, elements of locality-sensitive hashing emerge. Additionally, the model learns useful representations of high-dimensional data such as images and exploits them to design effective data structures. Beyond NN search, we believe the framework could be a powerful tool for data structure discovery for other problems and adapt our framework to the problem of estimating frequencies over a data stream. To encourage future work in this direction, we conclude with a discussion on some of the opportunities and remaining challenges of learning data structures end-to-end.

artificial intelligence, data structure, proceedings, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (0.88)

Add feedback

The Interplay of Data Structure and Imbalance in the Learning Dynamics of Diffusion Models

Nicoletti, Flavio, Ma, Chenxiao, Ventura, Enrico, Saglietti, Luca, Mannelli, Stefano Sarao

arXiv.org Machine LearningMay-8-2026

Real-world datasets are inherently heterogeneous, yet how per-class structural differences and sampling imbalance shape the training dynamics of diffusion models-and potentially exacerbate disparities-remains poorly understood. While models typically transition from an initial phase of generalization to memorizing the training set, existing theory assumes homogeneous data, leaving open how class imbalance and heterogeneity reshape these dynamics. In this work, we develop a high-dimensional analytical framework to study class-dependent learning in score-based diffusion models. Analyzing a random-features model trained on Gaussian mixtures, we derive the feature-covariance spectrum to characterize per-class generalization and memorization times. We reveal the explicit hierarchy governing these dynamics: class variance is the primary determinant of learning order-consistently favoring higher-variance classes-while centroid geometry plays a secondary role. Sampling imbalance acts as a modulator that can reverse this ordering and, under strong imbalance, forces minority classes to acquire distinct, delayed speciation times during backward diffusion. Together, these results suggest that diffusion models can memorize some classes while others remain insufficiently learned. We validate our theoretical predictions empirically using U-Net models trained on Fashion MNIST.

artificial intelligence, data structure, machine learning, (14 more...)

arXiv.org Machine Learning

2605.06367

Country: Europe > Italy (0.28)

Genre: Research Report > New Finding (0.87)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Add feedback

Worst-case Performance of Popular Approximate Nearest Neighbor Search Implementations: Guarantees and Limitations

Neural Information Processing SystemsApr-29-2026, 20:34:58 GMT

Graph-based approaches to nearest neighbor search are popular and powerful tools for handling large datasets in practice, but they have limited theoretical guarantees. We study the worst-case performance of recent graph-based approximate nearest neighbor search algorithms, such as HNSW, NSG and DiskANN. For DiskANN, we show that its "slow preprocessing" version provably supports approximate nearest neighbor search query with constant approximation ratio and poly-logarithmic query time, on data sets with bounded "intrinsic" dimension. For the other data structure variants studied, including DiskANN with "fast preprocessing", HNSW and NSG, we present a family of instances on which the empirical query time required to achieve a "reasonable" accuracy is linear in instance size. For example, for DiskANN, we show that the query procedure can take at least 0.1n steps on instances of size nbefore it encounters any of the 5nearest neighbors of the query.

artificial intelligence, information retrieval, natural language, (18 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)

Add feedback

47a5feca4ce02883a5643e295c7ce6cd-Paper.pdf

Neural Information Processing SystemsApr-25-2026, 17:12:27 GMT

artificial intelligence, arxiv preprint arxiv, machine learning, (17 more...)

Neural Information Processing Systems

Country:

North America > United States (0.28)
Europe (0.28)

Genre: Research Report (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.97)

Add feedback

Faster Query Times for Fully Dynamic k-Center Clustering with Outliers

Neural Information Processing SystemsApr-25-2026, 15:32:12 GMT

Given a point set P M from a metric space (M,d)and numbers k,z N, the metric k-center problem with z outliers is to find a set C P of k points such that the maximum distance of all but at most z outlier points of P to their nearest center in C is minimized. We consider this problem in the fully dynamic model, i.e., under insertions and deletions of points, for the case that the metric space has a bounded doubling dimension dim. We utilize a hierarchical data structure to maintain the points and their neighborhoods, which enables us to efficiently find the clusters. In particular, our data structure can be queried at any time to generate a (3 + ε)-approximate solution for input values of k and z in worst-case query time ε O(dim)klognloglog, where is the ratio between the maximum and minimum distance between two points in P. Moreover, it allows insertion/deletion of a point in worst-case update time ε O(dim) lognlog . Our result achieves a significantly faster query time with respect to k and z than the current state-of-theart by Pellizzoni, Pietracaprina, and Pucci [18], which uses ε O(dim)(k+z)2 log query time to obtain a (3+ε)-approximate solution.

artificial intelligence, dim, machine learning, (19 more...)

Neural Information Processing Systems

Country: